Deterministic-Probabilistic Models For Partially Observable Reinforcement Learning Problems

نویسندگان

M. M. Hassan Mahmud

John W. Lloyd

چکیده

In this paper we consider learning the environment model in reinforcement learning tasks where the environment cannot be fully observed. The most popular frameworks for environment modeling are POMDPs and PSRs but they are considered difficult to learn. We propose to bypass this hard problem by assuming that (a) the sufficient statistic of any history can be represented as one of finitely many states and (b) this state is given by a deterministic map from histories to the finite state space. This finite set of states can be interpreted as the state space of an MDP which can then be used to plan. Now the learning problem is to estimate this deterministic history-state map. One of the earliest approaches in this direction is McCallum’s USM algorithm. Our work can roughly be understood as extending this general idea by replacing prediction suffix trees, used in USM, with deterministic-probabilistic finite automata from learning theory. In this paper we describe our model, derive a pseudo-Bayesian inference criterion, and show its consistency. We also describe a heuristic algorithm that uses the criterion to learn the models, along with experiments showing its efficacy.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Partially Observable Action Models

In this paper we present tractable algorithms for learning a logical model of actions’ effects and preconditions in deterministic partially observable domains. These algorithms update a representation of the set of possible action models after every observation and action execution. We show that when actions are known to have no conditional effects, then the set of possible action models can be...

متن کامل

Learning Partially Observable Deterministic Action Models

We present the first tractable, exact solution for the problem of identifying actions’ effects in partially observable STRIPS domains. Our algorithms resemble Version Spaces and Logical Filtering, and they identify all the models that are consistent with observations. They apply in other deterministic domains (e.g., with conditional effects), but are inexact (may return false positives) or inef...

متن کامل

Probabilistic and Decision-Theoretic User Modeling in the Context of Software Customization

Research in the field of user modeling has aimed to supersede the current “one-size-fits-all” trend in software development that forces users to change their behaviour according to the preprogrammed functions. This paper discusses aspects of user modeling and its relevance to software customization. In particular, we focus on user modeling techniques that utilize probabilistic and decision-theo...

متن کامل

Reinforcement Learning in Partially Observable Markov Decision Processes using Hybrid Probabilistic Logic Programs

We present a probabilistic logic programming framework to reinforcement learning, by integrating reinforcement learning, in POMDP environments, with normal hybrid probabilistic logic programs with probabilistic answer set semantics, that is capable of representing domain-specific knowledge. We formally prove the correctness of our approach. We show that the complexity of finding a policy for a ...

متن کامل

Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems

Increasing attention has been paid to reinforcement learning algo rithms in recent years partly due to successes in the theoretical analysis of their behavior in Markov environments If the Markov assumption is removed however neither generally the algorithms nor the analyses continue to be usable We propose and analyze a new learning algorithm to solve a certain class of non Markov decision pro...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2010

Deterministic-Probabilistic Models For Partially Observable Reinforcement Learning Problems

نویسندگان

چکیده

منابع مشابه

Learning Partially Observable Action Models

Learning Partially Observable Deterministic Action Models

Probabilistic and Decision-Theoretic User Modeling in the Context of Software Customization

Reinforcement Learning in Partially Observable Markov Decision Processes using Hybrid Probabilistic Logic Programs

Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems

عنوان ژورنال:

اشتراک گذاری